almasriosama_94996_8158150_Report4_Almasri.pdf
  • Name: Osama AlMasri Internship Organization: WestRock Company Mentor/Preceptor's Name: Mitesh Patel Dates of 2-Week Period Covered: 3/9/2020-3/20/2020
  • Current Tasks: My project is within the Human Resources (HR) group at WestRock.
  • In addition to that, I worked on multiple side projects that were identified by the business.
  • Task 4 is to identify and acquire additional data if needed.
  • Specific steps/progress: Tasks 1, 2, and 3 progress: Performed additional data quality checks in R. In addition, the dataset included many different “actions” such as hires, terminations, and promotions that were not necessarily related to retirement.
  • Task 4 progress: At this point, no new data sources have been included; however, I was able to add features derived from the existing data, such as current age, which we think is necessary to include in the prediction model since people usually retire as they get older.
  • Task 5 progress: Race, Gender, Type, Country, Smoker, and Disabled categorical variables are to be included.
  • Task 6 progress: Used R to start building a prediction model with the H2O package (a rough sketch of the idea follows this list).
  • Task 7 progress: I was assigned a project to enhance an existing dashboard in QlikView that tracks multiple metrics/KPIs used by the Chief Human Resources Officer at the organization.
  • In addition to that, I was able to build an initial data model for another dashboard intended to gauge employee sentiment across the different channels offered by the company.
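A minimal sketch of Tasks 4-6 combined, assuming the report's R/H2O workflow translates to H2O's Python API; the file name, the birth_year column, and the retired target are assumptions, not the report's actual schema:

```python
# Hedged sketch, not the intern's actual code: derive a current-age
# feature and fit an H2O model with the Task 5 categoricals.
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()
hr = h2o.import_file("hr_actions.csv")      # hypothetical export

# Task 4: engineer current age from an assumed birth-year column
hr["current_age"] = 2020 - hr["birth_year"]

# Task 5: mark the listed categorical variables as factors
for col in ["Race", "Gender", "Type", "Country", "Smoker", "Disabled"]:
    hr[col] = hr[col].asfactor()
hr["retired"] = hr["retired"].asfactor()    # assumed binary target

# Task 6: a first H2O model
train, valid = hr.split_frame(ratios=[0.8], seed=42)
model = H2OGradientBoostingEstimator(seed=42)
model.train(y="retired", training_frame=train, validation_frame=valid)
print(model.auc(valid=True))
```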
bapatanjali_126445_8134209_Progress Report 4.pdf
  • Specific steps/progress Task 1: Searching for an API that can do route optimization.
  • Task 2: Using the API to randomize the streets and create an optimized route.
  • However, the driver makes more than 100 stops, and the device captures images at those 100 points.
  • I used the requests library in Python and created a script that reads a CSV, takes the longitude/latitude coordinates, and increments the index for each destination (a sketch follows this list).
  • The request creates a JSON output that has a sequence number attached to each waypoint and destination.
  • I take this output, and the data is then uploaded to a simple web app that is linked with Google Maps.
  • The driver then uses this app to know which route to take.
  • The picture below displays the stop sequence the driver should take for Wilmington, NC.
  • We want to onboard a few more cities, but we could not find relevant data on the internet.
  • Learned to use QGIS for visualizing street data and how to overlay different layers on one another.
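A minimal sketch of the CSV-to-routing-API script described above; the endpoint, parameter names, and CSV layout are assumptions, not the actual service used:

```python
# Hedged sketch of the CSV-to-routing-API workflow; everything named
# here (URL, fields) is an assumption for illustration only.
import csv
import requests

API_URL = "https://example.com/route/optimize"  # hypothetical endpoint

waypoints = []
with open("stops.csv", newline="") as f:
    for idx, row in enumerate(csv.DictReader(f)):
        waypoints.append({
            "index": idx,                        # incremented per destination
            "lat": float(row["latitude"]),
            "lon": float(row["longitude"]),
        })

resp = requests.post(API_URL, json={"stops": waypoints}, timeout=30)
resp.raise_for_status()

# The response is assumed to attach a visit sequence to each waypoint
for stop in resp.json()["stops"]:
    print(stop["sequence"], stop["lat"], stop["lon"])
```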
bartonronald_LATE_125819_8169488_Progress Report 4.pdf
  • I continued finishing the sqft feature engineering assignment I was working on in the previous progress report.
  • Most importantly, we had our second sprint review, where we presented our findings and results to our boss and others on the data science team.
  • Most of the first week of this period was spent finishing my feature engineering work and preparing the slides for the sprint review.
  • Whenever the difference between the two was greater than 100 sqft, the logic picked the third-party value, since it was supposed to come from tax data and could be assumed to be more accurate.
  • Another, more complex example was to first create a dictionary in Python that aggregated the average sqft per number of bathrooms across all the data.
  • Whenever the difference between internal and third party was greater than 400, it would fill in the sqft based on the dictionary value that matched the record's number of bathrooms (a sketch of both rules follows this list).
  • Finally, these new features are run through a function that generates a univariate plot to see their exposure on the various insurance perils.
  • At the sprint review I presented my features and the univariate plots, plus my other frequency/severity testing, and received very good feedback, which was great to hear.
  • I still find git somewhat confusing, but with the help of the other intern over Skype and screen sharing I was able to create the request to be reviewed and then pushed to the main branch.
  • Also, even in the world of tech and coding there are still grunt-work tasks, like individually labeling our variables, that need to be performed.
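A minimal pandas sketch of the two sqft-filling rules described above; the thresholds (100 and 400) come from the report, while the column names and the file are assumptions:

```python
# Hedged sketch of the two sqft rules; column names are assumed.
import pandas as pd

df = pd.read_csv("properties.csv")  # hypothetical input

# Rule 1: if internal and third-party sqft disagree by more than 100,
# trust the third-party (tax-data) value.
diff = (df["internal_sqft"] - df["third_party_sqft"]).abs()
df["sqft"] = df["internal_sqft"].where(diff <= 100, df["third_party_sqft"])

# Rule 2: average sqft per number of bathrooms, used as a fallback
# whenever the disagreement exceeds 400.
avg_by_bath = df.groupby("bathrooms")["sqft"].mean().to_dict()
df.loc[diff > 400, "sqft"] = df.loc[diff > 400, "bathrooms"].map(avg_by_bath)
```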
cantyjeremiah_LATE_35429_8168370_Report4_Canty.pdf
  • Name: Jeremiah Canty Internship Organization: UNC Charlotte Mentor/Preceptor's Name: Dr. Doug Hauge Dates of 2-Week Period Covered: 3/7/2020 – 3/20/2020
  • So, we need to analyze the run-time data and compare first.
  • Task I progress: I first had to work through Matplotlib, coworkers, scikit-learn, and trial and error to configure the visualizations for the specific results I wanted and to analyze and filter the dataset.
  • Task II progress: I made visualizations that show a histogram of the data within Python and show the overall progression of runners throughout their high school careers (a sketch follows this list).
  • Also shortened the weeks of data to display only the school-year sport season.
  • I also had to learn the certain grouping techniques needed for a chart I wanted.
  • Task I outcomes: Manipulated the data to create more detailed visualizations.
  • I edited and created a chart that displays the regression of boys' athletes throughout the running season.
  • Task II outcomes: The graphs, built in Python, show the number of people we have per year, based on how many years they ran and their average minutes.
  • Task III outcomes: I created insightful visualizations that display relationships.
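A minimal sketch of the histogram and career-progression charts described above; the CSV layout and column names are assumptions:

```python
# Hedged sketch of the running-data charts; data layout is assumed.
import pandas as pd
import matplotlib.pyplot as plt

runs = pd.read_csv("race_times.csv")  # hypothetical runner data

# Histogram of average race times
runs["avg_minutes"].plot.hist(bins=20, title="Average race time (minutes)")
plt.show()

# Progression across a high-school career: average time per school year
progression = runs.groupby(["runner_id", "school_year"])["avg_minutes"].mean()
for runner_id, series in progression.groupby(level=0):
    plt.plot(series.index.get_level_values("school_year"), series.values,
             alpha=0.3)  # one faint line per runner
plt.xlabel("School year")
plt.ylabel("Average minutes")
plt.show()
```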
cardenasarturo_27332_8147345_Report4_Cardenas.pdf
  • Due to the Covid-19 virus, Mecklenburg County has begun to reduce staff at public offices, sending most people to work from home.
  • The project was supposed to be coordinated by Keith Bailey, who works for Mecklenburg County, but he was not available at the moment.
  • I had an initial meeting on Monday, March 16th, where I was able to demo the REDCap web entry screens as well as the report.
  • Just in case the original project does not come to fruition, I asked Dr. Dulin, who is my internship sponsor and mentor, to see if I could work on a second project using open-source datasets, as long as I keep using R-Shiny dashboards and data science activities.
  • Any chosen project will have an R-Shiny dashboard as well as data science activities. After submission to Dr. Dulin, I will start working on this next week.
  • Task III: Decide which data science activities to apply, most likely cluster analysis, logistic regression, and random forest on a predictable health-progress variable.
  • Outcomes – At this point the original project is in suspense; I will try to follow up next week with Mr. James Walker and Dr. Dulin.
  • In the meantime, I will start working on a second parallel project, in case the original does not come to fruition.
  • In the meantime, it is important to keep moving. I will have two parallel projects; it will be more work for me, but I will do my best to have at least one finished by the end of April.
  • (c) Start working on the data science activities, most likely cluster analysis, logistic regression, and random forest.
copeblake_LATE_23876_8160837_Progress Report 4.pdf
  • Name: Blake Cope Internship Organization: Sports Business Journal Mentor's Name: Derick Moss Dates of 2-Week Period Covered: 3/9/20 – 3/20/20
  • I am continuing my work with the Sports Consumer research data.
  • The survey focuses on sport fans' viewing habits for each of the major professional leagues (MLB, NBA, NFL, NHL, and MLS).
  • It should be noted that due to the coronavirus pandemic, I have started working from home.
  • Task 1: To deal with overfitting, I wanted to check the collinearity of the variables I was working with by viewing their VIF values (a sketch follows this list).
  • I created a heat map to show the correlation of the remaining variables with consumers' interest in MLB.
  • Other visualizations that I plan on making are bar charts and x-y charts to show how different questions affect interest in MLB; I will then repeat the process for the other sport leagues.
  • Trying to filter out variables that weren't correlated and had high collinearity proved a little challenging at first, as it is not something that I am used to doing.
  • I am still maintaining constant contact with my team by virtually attending daily meetings at 9:30 am and with one of the data analysts at 1:30 pm.
  • Continue making visualizations for exploratory analysis of interest in MLB.
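A minimal sketch of the VIF check and correlation heat map described above; the data file and the mlb_interest column are assumptions:

```python
# Hedged sketch of a VIF/collinearity check; column names are assumed
# and the survey responses are assumed to be numeric.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

survey = pd.read_csv("sports_survey.csv")     # hypothetical data
X = add_constant(survey.drop(columns=["mlb_interest"]))

# VIF per predictor; values above roughly 5-10 usually flag collinearity
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif.sort_values(ascending=False))

# Correlation heat map of the remaining variables vs. interest in MLB
sns.heatmap(survey.corr(), cmap="coolwarm", center=0)
plt.show()
```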
demirelif_126087_8156756_Report4_Demir.pdf
  • I cleaned and prepared the data for an initial experiment with 20K instances. Then, I created multiple models to see insights from the dataset and to measure performance.
  • I was planning to include the advisors' states as a categorical variable to sort their importance from the perspective of increasing our recruitment effect.
  • Since we are predicting the likelihood of joining our firm, and not a specific geo-location, the address variables were removed from the dataset.
  • Additionally, the contact information, including mail addresses and phone numbers, was removed from the dataset.
  • There were some missing values for whether someone is a broker-dealer executive, an insurance agent, or an investment advisor representative; those were recoded as "no" because of the way the dataset was created, and after discussion with the marketing manager at IP.
  • Task 2: I created three different models to measure accuracy, including logistic regression, a decision tree, and a random forest (a sketch follows this list).
  • Cleaning data is very important before building a model; steps include imputation, removing NAs, and log transformations.
  • The model suggested that males, producers, age 1960+, insurance agents, investment advisors, and broker-dealer executives are more likely to respond to our recruitment effort compared to females, non-producers, age 1960-, and those who are not insurance agents, investment advisors, or broker-dealer executives.
  • The model's second suggestion was that the number of licenses and states registered are significant in predicting the recruitment effort and increase the likelihood of response.
  • The model's third suggestion was that the number of exam years and being employed as a registered representative are significant, but
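A minimal sketch of the three-model accuracy comparison in Task 2; the file, features, and responded target are assumptions:

```python
# Hedged sketch of the three-model comparison; data layout is assumed.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("advisors.csv")                 # hypothetical 20K instances
X = pd.get_dummies(df.drop(columns=["responded"]))
y = df["responded"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```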
duttaroma_117177_8160719_ProgressReport4_dutta.pdf
  • Dates of 2-Week Period Covered in the Progress Report: March 30th – April 6th
  • As part of my task for this week, I concentrated on generating the DTM of the corpus we are interested in.
  • A DTM (document-term matrix) is basically a matrix with documents designated by rows and words by columns, where the elements are counts or weights (usually tf-idf).
  • Afterwards, based on my findings, I had to categorize the text into a set of defined categories.
  • We have carried out the generation of the DTM to identify the key attributes from the complaint text that customers provided (a sketch follows this list).
  • This has helped us identify the top five categories that result in disputes among Wells Fargo customers.
  • A sentiment analysis / text mining exercise was carried out, which showed loan-related disputes as the primary reason for customer dissatisfaction.
  • This helps in making the assumption that each new complaint is assigned to one and only one category.
  • A decision was made to use R packages and IBM SPSS Text Analytics.
  • Better understanding of different categories of complaints ▪ Pros and cons of different text mining tools ▪ Understood that loan-related queries are the major source of customer complaints
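The report uses R and IBM SPSS Text Analytics; as an illustrative sketch only, here is the same DTM idea with scikit-learn and placeholder complaint texts:

```python
# Hedged sketch: a tf-idf DTM, rows = documents, columns = words.
# The complaints here are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer

complaints = [
    "My loan payment was misapplied",
    "Dispute over mortgage fees",
    "Card was charged twice",
]

vectorizer = TfidfVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(complaints)

print(dtm.shape)                          # (documents, vocabulary terms)
print(vectorizer.get_feature_names_out())
```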
gargdivya_LATE_116438_8173034_Progress Report 4.pdf
  • Name: Divya Garg Internship Organization: Open Data Nation Mentor/Preceptor's Name: Carey Anne Nadeau Dates of 2-Week Period Covered: 9th Mar – 20th Mar
  • To match crashes with the road network, we used the results from the exposure model and selected variables from the crash files, as well as other data provided by the data analytics team.
  • For State Task 4, I used the results obtained from the exposure model and collected data in a workable format, selecting features of interest as guided by the team lead.
  • The data files were huge; some had as many as 9 million observations for one particular year.
  • All the required tasks were done on the cloud to overcome storage issues and improve performance.
  • I was able to merge all the data files using the matching algorithm, and a new target variable for crashes was created (a sketch follows this list).
  • Worked on the matching algorithm, which was created to merge the data files.
  • Matched crashes with the road network file obtained from the exposure model.
  • I will be working on merging the newly created table with the weather data file.
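The report does not specify the matching algorithm, so this is only a rough pandas sketch of a crash-to-road-network merge and a new crash target variable; the join key and file names are assumptions:

```python
# Hedged sketch of merging crashes onto road segments; the shared
# segment_id key is an assumption, not the actual matching algorithm.
import pandas as pd

crashes = pd.read_csv("crashes.csv")         # selected crash variables
roads = pd.read_csv("road_network.csv")      # exposure-model output

# Match each crash to a road segment on an assumed shared segment id
merged = crashes.merge(roads, on="segment_id", how="left")

# New binary target: whether a segment had at least one crash
roads["had_crash"] = roads["segment_id"].isin(crashes["segment_id"]).astype(int)
```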
gulleyalexander_117139_8149021_Report4_Gulley.pdf
  • o The largest breakpoint was accounted for, which helped make the time series more stable.
  • - Met with stakeholders / subject matter experts
  • o Discussed the model's initial results with stakeholders.
  • o Discussed possible exogenous variables to include.
  • Before making this adjustment, the trend component was considered to be larger than it should have been and produced models that were wildly inaccurate.
  • The change damped the trend component down and allowed for a better-fitting model.
  • Still need to decide whether a grid search of orders for ARIMA or the auto.arima function is the best way to determine the parameters.
  • One model that seems promising is Holt-Winters multiplicative (possibly with a damped trend); a sketch follows this list.
  • It appears that a log transform (Box-Cox with lambda = 0) should improve the model by helping shift the residuals to be more normally distributed.
  • o Verify results
  • o Discuss output formats
  • o Talk about integration
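The modeling presumably lives in R (auto.arima), so this Python sketch of the two candidate approaches is only illustrative, with a placeholder series:

```python
# Hedged sketch: Holt-Winters multiplicative with a damped trend, plus
# an auto-ARIMA analogue of R's auto.arima on a log-transformed series
# (Box-Cox with lambda = 0). The series here is random placeholder data.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing
import pmdarima as pm

y = pd.Series(np.random.rand(48) + 1,
              index=pd.date_range("2016-01", periods=48, freq="MS"))

hw = ExponentialSmoothing(
    y, trend="mul", damped_trend=True, seasonal="mul", seasonal_periods=12
).fit()

arima = pm.auto_arima(np.log(y), seasonal=True, m=12)
print(hw.aic, arima.aic())
```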
hakasmaggie_127066_8155691_ProgressReport4Hakas.pdf
  • Name: Maggie Hakas Internship Organization: The Hartford Mentor/Preceptor's Name: Heather Grebe/Lane Coonrod Dates of 2-Week Period Covered: 3/9-3/20
  • Task I was looking into which variables in the internal dataset are telling the same story and should likely be removed.
  • Task II is testing the new model notebook for errors prior to it being put into production.
  • Luckily, the company has been very helpful in making going remote a seamless experience.
  • The workbook we created is organized in a way that not only will be useful for the next part of the project, but also serves as a readable dictionary for others to use when working with internal variables.
  • The backend code is made up of many nested functions and list comprehensions, which were things I needed to understand better.
  • Task III outcomes: At first, working from home was extremely difficult.
  • We had to turn off the video chat function for our Skype conferences to keep from losing wifi; however, a new team has been formed that has made things a little more fun.
  • (a) Produce univariates on internal variables and make decisions regarding the results
  • (b) Move into hyperparameter research to see how max depth and other settings affect model results (a sketch follows this list)
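A minimal sketch of the max-depth hyperparameter search planned in (b); the model type and data are stand-ins, not the production notebook:

```python
# Hedged sketch of a max-depth grid search; synthetic data stands in
# for the internal dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, random_state=0)

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"max_depth": [2, 3, 5, 8], "n_estimators": [100, 300]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```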
kishorekumarsudha_95533_8156904_Progress Report_03202020.pdf
  • Name: Sudha Kishorekumar Internship Organization: CVS Health Mentor Name: Lisa Klein Dates of 2-Week Period Covered: 03/09/2020 – 03/20/2020
  • There are a lot of steps involved in creating membership tables.
  • Data is then summarized based on HEDIS reporting entities.
  • Downstream membership reports and tables are generated from the summarized data.
  • Tables needed for membership reporting are created in SAS.
  • The SAS code is then scheduled in Tidal so that it automatically executes on the 11th of every month.
  • I received instructions from the Tidal administrator on all the required documentation that needs to be completed for scheduling the SAS code.
  • Development of the SAS code to create membership tables is underway.
  • Identified a few plan carrier arrangement IDs that were not mapped to a line of business.
  • Working with the business to determine the mapping for these arrangement IDs.
laixinxin_121407_8158341_Report4_Lai.pdf
  • Preprocess the data and build baseline Logistic Regression and Random Forest models first.
  • 1) I presented the plots of correlations to them remotely this week on Google Meet.
  • I explained to them that if there is correlation among different features, whatever we predict will not be that reliable, and we need to do something to get rid of it.
  • As a matter of fact, PCA is a good way to fix this, or we can create a proxy.
  • To that end, I will conduct PCA on the two groups of highly correlated features from my previous study, which are Deter_Paper vs. Grocery and Frozen vs. Fresh (as shown below; a sketch also follows this list):
  • However, if we need to compare model reliability and want to see the sensitivity and variable importance ranking, the tuned Random Forest achieves a better job, with the accessibility of the Kappa and varImp functions and a higher sensitivity of 93.26%.
  • And if we take a second thought, it is intuitive that the biggest difference between class 1 (Hotel/Restaurant/Café) and class 2 (Grocery) is that grocery stores purchase many kinds of detergents and paper in order to provide a variety of choices for their consumers, while class 1 tends to purchase only the one or two kinds of detergents and paper they are used to.
  • 2) When interpreting the results and the metrics, it is essential to draw on domain knowledge and the context of the business from the experts in order to gain better understanding and validation of the model results.
  • 1) Try to group conceptually related variables and then conduct PCA, then rebuild the Logistic Regression.
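The study is presumably in R (the Kappa/varImp mentions suggest caret), so this scikit-learn sketch of PCA on the two correlated pairs is only illustrative; the file and exact column names are assumptions:

```python
# Hedged sketch: one principal component per correlated feature pair
# (Deter_Paper vs Grocery, Frozen vs Fresh). Column names are assumed.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("wholesale_customers.csv")  # hypothetical file name

for group in [["Detergents_Paper", "Grocery"], ["Frozen", "Fresh"]]:
    scaled = StandardScaler().fit_transform(df[group])
    pca = PCA(n_components=1)                # one component per pair
    df["_".join(group) + "_pc1"] = pca.fit_transform(scaled)[:, 0]
    print(group, pca.explained_variance_ratio_)
```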
muchajulian_1892_8158994_Report4_Mucha.pdf
  • Name: Julian Mucha Internship Organization: Compass Group USA, Inc. Mentor/Preceptor's Name: Nicholas Greene Dates of 2-Week Period Covered: March 6 – March 20
  • During this time working from home, I am primarily focused on coding the linear programming model in Python using the PuLP package (a sketch follows this list).
  • Weekly meetings have been eliminated since we are not working in the office, but I have been providing updates to my manager via email and asking any questions that I may have.
  • My manager suggested a simple base case that would be used as a test for the code and to see if the model could in fact find an optimal solution to a simplistic problem.
  • From the work I have conducted in the past two weeks, I have formulated and implemented a linear programming model in Python that solves the basic case mentioned before.
  • I feel that I have been a lot more productive and have made major strides toward solving the larger labor scheduling optimization problem.
  • I have learned a great deal about efficient methods of handling data in Python using dictionary and list comprehensions, as well as navigating the PuLP package to devise a model.
  • My plan for the next two weeks is to continue plotting out a mathematical formulation of the larger and more complex version of the labor scheduling problem.
  • Once I am able to fully flesh out a mathematical formulation that could be generalized to any cost center, each of which may have its own individual constraints, I can see about writing code that makes it easy for a user to generate the solution they need without any hassle.
  • I hope to accomplish this goal in the coming weeks and potentially wrap this project up in time for final presentations.
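A minimal PuLP sketch of a base-case labor-scheduling LP like the one described; the shifts, demand, and costs are invented for illustration:

```python
# Hedged sketch of a base-case scheduling LP; all numbers are invented.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpStatus

shifts = ["morning", "afternoon", "evening"]
demand = {"morning": 3, "afternoon": 5, "evening": 2}     # workers needed
cost = {"morning": 90, "afternoon": 100, "evening": 110}  # cost per worker

prob = LpProblem("labor_scheduling", LpMinimize)
workers = {
    s: LpVariable(f"workers_{s}", lowBound=0, cat="Integer") for s in shifts
}

# Objective: minimize total labor cost
prob += lpSum(cost[s] * workers[s] for s in shifts)

# Constraints: cover the demand in every shift
for s in shifts:
    prob += workers[s] >= demand[s]

prob.solve()
print(LpStatus[prob.status], {s: workers[s].value() for s in shifts})
```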
paulkabita_126534_8131753_Progress Report 4.pdf
  • We have started analysis of cancer patients' data to analyse various factors causing severe and chronic pain after surgery.
  • We figured that health-article recommendation is one of the important features of our web and mobile dashboard, which can be addressed from a machine learning point of view.
  • Task II: Analyse collected data and build an article recommender system.
  • Task I progress: This project of building a customized interactive tool requires a high-level understanding of the healthcare system.
  • We collected health advice articles online to build our initial database.
  • We planned to create a hybrid model, i.e., a combination of content-based and collaborative filtering methods (a sketch follows this list).
  • However, as we are designing a new system, we faced the cold-start problem while building our model, which means we do not have enough data to find correlations between several users' choices.
  • In the original data collection study, patients were divided into three control groups based on the anaesthesia types provided.
  • Task II outcomes: We did the initial analysis and are in the process of building the recommender engine.
  • (b) Come up with a basic article recommender engine for our health application (c) Correlation analysis between variables of the cancer patient dataset (d) Find more relatable data sources and analyse them.
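A minimal sketch of the content-based half of the hybrid recommender, which sidesteps the cold-start problem by relying on article text alone; the articles are placeholders:

```python
# Hedged sketch of content-based article recommendation via tf-idf
# similarity; the article texts are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = [
    "Managing chronic pain after surgery",
    "Nutrition tips for cancer recovery",
    "Breathing exercises to reduce post-operative pain",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(articles)
sim = cosine_similarity(tfidf)

# Recommend the article most similar to the one a user just read
read_idx = 0
scores = sim[read_idx].copy()
scores[read_idx] = -1                      # exclude the article itself
print("recommend:", articles[scores.argmax()])
```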
penmathsamanideepvarma_127089_7845505_Report4_Penmathsa.pdf
  • Name: Manideep Varma Penmathsa Internship Organization: Wells Fargo Mentor/Preceptor's Name: Chris Guella Dates of 2-Week Period Covered: Jan 27th – Feb 7th
  • Roll back to UAT and re-design, add new things, and make changes as necessary from the new requirements if needed.
  • Task III: Start the transition of taking ownership of the Python optimizer model.
  • Finally, continue to build more Tableau dashboards for the team wherever applicable.
  • Task I Progress: During the seventh week, I deployed the Tableau dashboards with Day 0 data and Repo amounts, sales amounts, and various other liquidity product code data from Day 1 onwards onto the PROD environment.
  • Specifically, the Tableau web server used to deploy the PROD dashboards is at the URL tableau.wellsfargo.com.
  • Task I Outcomes: During this period, I successfully published Tableau dashboards onto the production server.
  • Tasks such as maintaining dashboards, rolling them back into UAT for any fixes/re-designs, maintaining the Python optimizer model code, making any changes from the new requirements, fixing bugs, etc., need to be performed throughout the year or more.
  • This continuous cycle of learning and fixing issues will extend beyond the scope of this internship.
  • Learned about maintaining code, fixing bugs, linear optimization, etc.
richterlyndsay_LATE_4676_8164319_Report4_Richter.pdf
  • Name: Lyndsay Richter Internship Organization: UNC Charlotte Student Affairs Research and Assessment (SARA) Mentors: Dr. Erin Bentrim and Dr. Ellissa Brooks Nelson (SARA) Dates: March 10 – March 20
  • Our division, with the rest of the University, has scrambled to manage an influx of communication messaging and operational changes, including the relocation of thousands of residential students, reduced hours and closings of student life facilities, the implementation of remote/teleworking arrangements, and the challenge of continuing critical student services related to both physical and mental health as well as academic functions.
  • Task I: Reviewed and made minor edits to the three SASS interactive dashboards.
  • Task II: Observed and analyzed the data visualizations produced and/or published related to the coronavirus from the following digital outlets:
  • Task I: The departmental presentation has been postponed indefinitely due to the coronavirus impact; however, I will be moving forward with transitioning this information to a format for electronic delivery (likely a Tableau Story or PowerPoint).
  • Task II: A variety of maps using geospatial data are the most common visualization being used to communicate the scope of the coronavirus.
  • Bar charts (some good, some bad) seem to be the next most common delivery method for coronavirus information.
  • These are used for information related to the outbreak, but also for comparing the coronavirus to other diseases or viruses, such as the flu.
  • Reconnect with members of the Veteran Students Services office to discuss opportunities for data analysis; depending on the outcome, redirect internship efforts to focus on other areas.
serapinzach_29510_8155752_Report4_Serapin.pdf
  • I was pulled in a different direction this week and had some time taken away from my individual project.
  • I was asked to complete a few tasks relating to the bank's CCAR submission.
  • While the work was business critical, it was very repetitive and lacked direct data science application.
  • I was also asked to refresh my report on negative rates to account for two additional conditions.
  • The lists of up- and downstream connections were stored as lists in my column values, which led me to write a for loop that iterated through all rows and generated the specific models that were added as connections or dropped over each month (a sketch follows this list).
  • After some more simple data manipulation, I started generating summary tables and visualizations to help paint a better picture of what is going on.
  • It may be fair to assume that early on in my career I will be asked to take data and consistently manipulate it in ways that make it convenient, easy, or possible to analyze.
  • While it may be a little more glamorous to build fancy machine learning models, the core of data science is sometimes taking data and framing it in ways that allow us to ask questions and then develop answers.
  • My mentor has given me a few steps and ideas to help begin translating the work into an end product.
  • I'm also going to be doing some research into inverse Gaussian curves, as they can help assess and determine a model's maturation as it evolves in the network.
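A minimal pandas sketch of the added/dropped-connection loop described above; the frame layout (one row per model per month, with the connection list in a column) is an assumption:

```python
# Hedged sketch of month-over-month added/dropped connections; the
# toy frame stands in for the actual model-network data.
import pandas as pd

df = pd.DataFrame({
    "model": ["A", "A", "A"],
    "month": ["2020-01", "2020-02", "2020-03"],
    "connections": [["B", "C"], ["B", "D"], ["D"]],
})

df = df.sort_values(["model", "month"])
for _, group in df.groupby("model"):
    prev = set()
    for _, row in group.iterrows():
        curr = set(row["connections"])
        print(row["model"], row["month"],
              "added:", sorted(curr - prev),
              "dropped:", sorted(prev - curr))
        prev = curr
```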
singaravelmurali_110276_8144796_Report4_Singaravel.pdf
  • Muralidharan Singaravel Student Id: 801059720 Spring 2020 DSBA Internship Progress Report 4
  • Current Tasks: My internship project is titled “Customer Complaint Analysis Using Machine Learning,” and the main objective of the project is to analyze customer complaint data and answer questions for leadership by identifying key and emerging trends, volumes, themes, and insights that will help in developing root cause analytics.
  • As a result of the global COVID-19 pandemic, I have been asked to work remotely and help out on a different project to assess the impact of COVID-19 on Wells Fargo customers.
  • Outcomes – The visualization of the customer complaints data helped to interpret the analysis easily.
  • The outcome of the COVID-19 analysis will be a daily presentation deck for the executives, giving insights such as the volume of COVID-19-related complaints since the outbreak, current emerging trends (such as customers expressing financial hardship or cancelled travel plans), and current complaint case attributes and taxonomy.
  • 2-Week Plan: For the next two weeks, the next step is to develop and test models on various machine learning algorithms: classification, to classify consumer complaints into predefined categories, and regression, to predict the reasons for customer complaints (a sketch follows this list).
  • Some visualizations from the publicly available customer complaints dataset against Wells Fargo.
  • Opening/closing an account, struggling to pay a mortgage, problems with a purchase, and trouble during the payment process are among the top 10 issues in the data.
  • They could potentially be solved with better website and mobile app design, or better training for the bank associates who interact with customers.
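A minimal sketch of the planned complaint classification, using a tf-idf plus logistic regression pipeline with placeholder texts rather than Wells Fargo data:

```python
# Hedged sketch of classifying complaints into predefined categories;
# the labeled examples are invented placeholders.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "cannot close my checking account",
    "struggling to pay my mortgage this month",
    "charged twice during the payment process",
    "mortgage payment was not applied",
]
labels = ["account", "mortgage", "payments", "mortgage"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["trouble paying mortgage after job loss"]))
```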
summeykelsey_1592_8157471_Progress Report 4.pdf
  • Title: Economic Development Profit & Loss Project
  • a. Weekly Meetings with Vice President b. Check-ins with our Directors c. Check-ins with our Team Members d. Check-ins with Large Account Management e. Weekly Meetings with Data Scientists f. Conduct predictive analytics
  • What types of projects (by industry) have a steady decline in revenue?
  • My weekly meetings with my Vice President are to ensure I am staying on top of my tasks.
  • This includes Account and Service Point Numbers.
  • Conducting predictive analytics will have to be added to my next progress report.
  • Because everyone is working from home and only essential employees can be on our VPN at certain times, many of my meetings got cancelled.
  • The Directors and I have scheduled trainings, and this has led to most of the team learning …
  • Large Account Management meetings always allow me to increase the accuracy of my data.
  • … completely, and I hope I'll be able to get back on track by the presentation date.
tomasikmarie_126529_8158318_Progress_Report_4.pdf
  • Name: Internship Organization: Mentor's Name: Dates of 2-Week Period Covered:
  • Task 2 was to present the initial model to the stakeholders.
  • I finished the initial model to predict unionization using the top hypotheses identified by the project stakeholders.
  • I presented the significant factors that went into the final model to the stakeholders.
  • I made recommendations on which plants may be most at risk of unionization based on the findings from testing the initial hypotheses.
  • In our meeting, the stakeholders still felt that many of the original hypotheses we came up with were worth testing, so I am working on gathering that data.
  • They also had a new theory that plants with weekend and/or night shifts may be more likely to unionize, so that will be added to the list of theories to test.
  • The model was finished, and we found that the engagement survey data was the only significant factor among the top hypotheses.
  • From this discussion I will move forward gathering data to test our remaining theories, as well as the shift theory and any others they send me as they think of them.
  • Gather data for next hypotheses ● Test next hypotheses ● Test model with new BLS data from 2019
vavilalasrivan_19848_8158870_Internship - Progress Report #4.pdf
  • Name: Srivan Vavilala Company: Vishion Mentor: Gurtej Singh Work Period: Mar.
  • The main goal at the moment is to research the problem that we're facing and note whether it has ever been solved before.
  • Although it's tedious, I've grown to appreciate the importance of understanding the problem from as many sides as possible before jumping in to solve it.
  • Additionally, there are datasets that are going to be subject to the categorization algorithm that we're going to be writing.
  • Therefore, I've been doing basic Excel- and SQL-based exploratory analysis on that dataset in order to understand what it's trying to tell us at its core.
  • I've discovered that there aren't any real connections in the way each of the products being categorized is named/tagged.
  • … that there are secondary/tertiary pairs of eyes that look over it to reduce the number of potential logical errors.
  • ❖ Although it's not the most interesting of all data science areas of work, business intelligence can be the most important piece of the puzzle for companies looking to break into mainstream data science.
  • Research other solutions to the same/similar problem in order to avoid wasting time.
vegesnakovidh_31534_8159051_Report4_Vegesna.pdf
  • Name: Internship Organization: Mentor/Preceptor's Name: Dates of 2-Week Period Covered:
  • The first step was trying to identify how to input the data into the functions.
  • There are some sections of the code that I am unsure how to apply to the problem we are trying to solve.
  • … Lo and her team to better understand the problem they are facing and clarify other questions along the way.
  • I looked for packages and libraries in R commonly used for this type of problem.
  • The second task was organizing all the data files and code on GitHub.
  • A major part of a project is making sure all the necessary files and documents are organized properly.
  • I uploaded all the code, files, and data onto a separate branch on GitHub.
  • There are a lot of other R packages that are used for DNA sequence analysis.
  • The other thing is to work on getting the setup for the R package started.
xiachunqiu_116382_8149885_Report4_Xia.pdf
  • Internship Organization: University of North Carolina, Belk College
  • The current project is to explore customers' behaviors in the Yelp dataset.
  • In these two weeks, my work was to find ways to add consistent time intervals to the dataset and to try to use a linear mixed model to predict time-varying effects.
  • Task I: Last time, I fit the time-varying effect model with coxph, but the problem was that I didn't have the expected variables as our response.
  • The second step is to combine year and month to form an actual quarter, then fit a time_id to each observation (a sketch follows this list).
  • For Task II: The model is fitted well, but the problem is drawing the graphs.
  • If I want to draw two or more graphs of the coefficients of covariates, there is always a bug, and I will try to fix it in the future.
  • I learned LME in undergraduate school but haven't used it for a long time.
  • Improve the LME and try to draw multiple lines for each coefficient.
  • b) Try to draw a conclusion for TVEM, including model estimation and interpretation of the coefficients.
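The analysis is done in R (coxph, LME), so this Python sketch of the quarter/time_id construction and a linear mixed model is only illustrative; the columns are assumptions:

```python
# Hedged sketch, not the report's R code: build a quarterly time_id and
# fit a linear mixed model with statsmodels. Column names are assumed.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("yelp_reviews.csv")   # hypothetical extract

# Combine year and month into an actual quarter
df["quarter"] = pd.to_datetime(
    df[["year", "month"]].assign(day=1)
).dt.to_period("Q")

# Consecutive numeric time_id per quarter
quarters = sorted(df["quarter"].unique())
df["time_id"] = df["quarter"].map({q: i + 1 for i, q in enumerate(quarters)})

# Random intercept per business, time_id as a fixed effect
model = smf.mixedlm("stars ~ time_id", df, groups=df["business_id"]).fit()
print(model.summary())
```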
xiaodiwen_117612_8159478_Report4_Xiao.pdf
  • Name: Diwen Xiao Internship Organization: UNC Charlotte Mentor/Preceptor's Name: Professor Ming Chen Dates of 2-Week Period Covered in the Progress Report: 03/07/2020-03/20/2020
  • Current Tasks: Because COVID-19 emerged and many workers were required to work from home, the progress of our project has slowed down.
  • Specific Steps/Progress Task I: We propose to use a YOLO model to detect AOIs in dynamic eye-tracking data and then combine the detected AOIs with the mapped coordinates of the eye gaze data (a sketch follows this list).
  • This way, we can use the trained YOLO model to detect the focused AOIs in all the collected eye-tracking videos for further analysis.
  • Task II: I have read the three research papers: “The Determinants of Web Page Viewing Behavior: An Eye-tracking Study,” “Information Acquisition during Online Decision Making: A Model-based Exploration Using Eye-tracking Data,” and “Eye Tracking Reveals Processes that Enable Conjoint Choices to Become Increasingly Efficient with Practice.”
  • Then we can use the trained model to detect the focused AOIs in all the collected eye-tracking videos for further analysis.
  • Task II outcomes: I wrote a summary of these three references related to traditional eye-tracking studies and research.
  • In “Information Acquisition during Online Decision Making: A Model-based Exploration Using Eye-tracking Data,” the researchers want to apprehend and ascertain the pattern of information acquisition for attribute-by-product matrices that are frequently used in online choice environments, using a hierarchical hidden Markov model of eye-tracking data.
  • And “Eye Tracking Reveals Processes that Enable Conjoint Choices to Become Increasingly Efficient with Practice” finds evidence of how eye-tracking devices can increase the effectiveness, efficiency, and reliability of choice-based conjoint exercises.
  • 2) Learned how to quickly summarize and generate summaries based on the research papers I have read.
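A minimal sketch of running a pretrained detector over an eye-tracking video and checking whether a mapped gaze point falls inside a detected AOI; it uses the ultralytics package, which is not necessarily the YOLO tooling the project used, and the gaze coordinates are placeholders:

```python
# Hedged sketch: detect candidate AOIs with a pretrained YOLO model and
# test a single gaze point against the first frame's boxes.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                     # pretrained weights
results = model("eye_tracking_session.mp4")    # hypothetical video

gaze_x, gaze_y = 512, 300                      # mapped gaze point (assumed)
first = results[0]
for box in first.boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    if x1 <= gaze_x <= x2 and y1 <= gaze_y <= y2:
        print("gaze inside AOI:", first.names[int(box.cls)])
```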